Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement gMSA for Windows upstream kubernetes tests #2208

Closed
wants to merge 2 commits into from

Conversation

jsturtevant
Copy link
Contributor

What type of PR is this?
/kind feature

What this PR does / why we need it:
This added the need infrastructure and setup required to test gMSA for windows. See details in #1860.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1860

Special notes for your reviewer:

This requires one time set up in the subscription using the script provided scripts/gmsa/gmsa-setup.sh which configures some required managed identities and access to key-vault resources. This requires special privileges (Microsoft.Authorization/roleassignments/write) to configure access.

Other requirements:

Initially tried to implement this with out introducing a new template but felt it would be closer to a customer experience if we had a new template. I left the commits for now but can re-base once we get through reviews.

Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests

Release note:

NONE

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 30, 2022
@jsturtevant jsturtevant force-pushed the gmsa branch 6 times, most recently from 4eb2c61 to 2f3c4cb Compare March 31, 2022 18:51
@jsturtevant
Copy link
Contributor Author

Set up in Azure sub is complete
/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

@jsturtevant
Copy link
Contributor Author

serial slow test passed and the gMSA tests ran and passed 🥳

{"msg":"PASSED [sig-windows] [Feature:Windows] GMSA Full [Serial] [Slow] GMSA support works end to end"
{"msg":"PASSED [sig-windows] [Feature:Windows] GMSA Full [Serial] [Slow] GMSA support can read and write file to remote SMB folder"

@jsturtevant jsturtevant marked this pull request as ready for review April 1, 2022 02:46
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 1, 2022
@k8s-ci-robot k8s-ci-robot requested a review from devigned April 1, 2022 02:46
@jsturtevant jsturtevant force-pushed the gmsa branch 3 times, most recently from c73ea82 to 5d9765f Compare April 1, 2022 16:33
@jsturtevant
Copy link
Contributor Author

/test pull-cluster-api-provider-azure-windows-containerd-upstream-with-ci-artifacts-serial-slow

@jsturtevant
Copy link
Contributor Author

gMSA tests passed again.

/assign @CecileRobertMichon @marosset

@marosset
Copy link
Contributor

marosset commented Apr 1, 2022

It would be nice if we could figure out a way to not need new templates for to run the gmsa tests.

@jsturtevant
Copy link
Contributor Author

It would be nice if we could figure out a way to not need new templates for to run the gmsa tests.

In the pr description I called this out. The first commit in the PR demonstrates how this would look without an additional template in the gmsa.go file.

I moved to the template approach for couple reasons:

  • it simplified some of the setup of the identities and vnet peering that is done in the after cluster creation section. Basically the only part that is done there now is the stuff required for the k/k tests nothing for gmsa to work.
  • it adds additional coverage to managed identities we don't cover in any other tests today
  • it is cluster to what a customer setup would look like

I could go either way but lean toward having the additional template for the above reasons.

@jackfrancis
Copy link
Contributor

Not to throw a wrench in current progress, but can the gMSA install pre-reqs be implemented as a helm chart?

@jsturtevant
Copy link
Contributor Author

Not to throw a wrench in current progress, but can the gMSA install pre-reqs be implemented as a helm chart?

The only component that needs to be installed for the implementation here is the ccg keyvault plugin which is install on the VM via image-builder. The webhook is installed at test run time so not needed here.

There is a WIP for a chart if a customer wanted to use it, otherwise it would need to be installed seperately. The ccg plugin is not a part of the chart.

@jackfrancis
Copy link
Contributor

@jsturtevant sounds like wrapping all of the needful capz stuff into a helm chart for installation via test CI is not worth the effort...

docs/book/src/developers/development.md Outdated Show resolved Hide resolved
scripts/gmsa/ci-gmsa.sh Outdated Show resolved Hide resolved
scripts/gmsa/ci-gmsa.sh Outdated Show resolved Hide resolved
scripts/gmsa/ci-gmsa.sh Outdated Show resolved Hide resolved
scripts/gmsa/ci-gmsa.sh Outdated Show resolved Hide resolved
scripts/gmsa/ci-gmsa.sh Outdated Show resolved Hide resolved
test/e2e/gmsa.go Outdated Show resolved Hide resolved
keyVaultClient.Authorizer = keyvaultAuthorizer

// Wait for the Cluster nodes to be ready (this is different than capi's ready as cni needs to finish initializing)
windowsCalico := &appsv1.DaemonSet{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is assuming calico CNI, would we ever want it run with other CNIs? Would it be equivalent to check the nodes are all "Ready"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this before I learned about: ControlPlaneWaiters: clusterctl.ControlPlaneWaiters{ Maybe that would be a better approach? It might help with some flakes that happen before the Windows pods are fully ready too.

GMSA_DOMAIN_ENVSUBST="${REPO_ROOT}/scripts/gmsa/domain.init"
GMSA_DOMAIN_FILE="${REPO_ROOT}/scripts/gmsa/domain.init.tmpl"
$ENVSUBST < "$GMSA_DOMAIN_FILE" > "$GMSA_DOMAIN_ENVSUBST"
az vm create -l "$AZURE_LOCATION" -g "$GMSA_NODE_RG" -n "$vmname" \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you consider using the go sdk to run these prerequisites directly in the test suite instead of using the az cli in a script? similar to what we do for private cluster custom vnet setup

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did, having the creation of the domain outside the test suite made it so it could be used across test entry point scripts or by someone outside the project for their testing. This still requires a few additional set before being to run tests so maybe bringing it in to test suite would be fine now.

The other aspect of this, is that testing the domain creation was cumbersome and having it out side the test suite made it easier to iterate on without having to modify the rest of the tests to be skipped why working on it.

I listed a bunch ideas in #1860 (comment)

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from cecilerobertmichon after the PR has been reviewed.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

spec:
identity: UserAssigned
userAssignedIdentities:
- providerID: "/subscriptions/${AZURE_SUBSCRIPTION_ID}/resourceGroups/${CI_RG}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/cloud-provider-user-identity"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't this require the fix in #2214 to work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had similar question but after discussing with @mboersma we figured out that since we are not testing any cloudprovider specific flows in this set of tests it doesn't exercise the code that would cause issues.

"${REPO_ROOT}/hack/log/redact.sh" || true
}

trap cleanup EXIT

if [[ "${WINDOWS}" == "true" ]]; then
if [[ $KUBETEST_WINDOWS_CONFIG =~ "windows-serial-slow" ]]; then
Copy link
Contributor

@marosset marosset Apr 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this bit documented anywhere?
I can image someone having a bad day trying to troubleshoot why GMSA tests aren't running correctly somewhere in PROW only to find this conditional...
Maybe we could at least add a log line like 'Skipping GMSA configuration' if we aren't performing the config to help debugging?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't. We currently run the GMSA tests in the serial/slow jobs. There isn't a strict requirement for this and in fact the tests are pretty fast, just to initial set up of the cluster and domain is slow.

I used this because i didn't want to introduce yet another ENV but maybe it would be better to have it as additional setup? It would make your suggestion and debugging simpler.

@jsturtevant
Copy link
Contributor Author

jsturtevant commented Apr 27, 2022

Had discussion in slack and zoom to discuss the following in bit more in detail:

  1. should we create a new CI template or do configuration purely in code (Implement gMSA for Windows upstream kubernetes tests  #2208 (comment))
  2. should the creation of the domain vm be done in the code of e2e.test vs script (Implement gMSA for Windows upstream kubernetes tests  #2208 (comment))

Summary was:

  1. using a CI template to do the work for vm identities, vnet peering and such is the best approach but was big discussion as to if this should live in the capz repo (including other e2e tests)
  2. Along the same lines for this issue we discussed if the scripts to create the domain vms and do the set up should really live inside capz repo. General agree is this might not be the best place for them.

Also discussed how to make this as re-usable as possible for other providers but so much of the setup is azure specific there may not be much.

One possible idea was to move some of the testing templates and scripts to live in the sig-windows gmsa repo along with maybe using the outcome of az capi cli updates to create the clusters:

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 4, 2022
@k8s-ci-robot
Copy link
Contributor

@jsturtevant: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jsturtevant
Copy link
Contributor Author

most of this work has been moved to kubernetes-sigs/windows-testing#328

/close

@k8s-ci-robot
Copy link
Contributor

@jsturtevant: Closed this PR.

In response to this:

most of this work has been moved to kubernetes-sigs/windows-testing#328

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. release-note-none Denotes a PR that doesn't merit a release note. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for running Windows gMSA tests
5 participants